DNA was extracted using the Qiagen MagAttract PowerSoil DNA KF kit (Formerly MOBio PowerSoil DNA Kit) using a KingFisher robot. DNA quality was evaluated visually via gel electrophoresis and quantified using a Qubit 3.0 fluorometer (Thermo-Fischer, Waltham, MA, USA). Libraries were prepared using an Illumina Nextera library preparation kit with an in-house protocol (Illumina, San Diego, CA, USA).
Paired-end sequencing (150 bp x 2) was done in a NovaSeq 6000 instrument. Shotgun metagenomic sequence reads were processed with the Sunbeam pipeline. Initial quality evaluation was done using FastQC v0.11.5 1. Processing took part in four steps: adapter removal, read trimming, low-complexity-reads removal, and host-sequence removals. Adapter removal was done using cutadapt v2.6 2. Trimming was done with Trimmomatic v0.36 3 using custom parameters (LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). Low-complexity sequences were detected with Komplexity v0.3.6 4.
High-quality reads were mapped to the human genome (Telomere-to-Telomere assembly, T2T-CHM13v2.0, GCF_009914755.1) and those that mapped to it were removed from the analysis.
The remaining reads were taxonomically classified using Kraken2 with the PlusPF database version 2022-09-265. For functional profiling, high-quality (filtered) reads were aligned against the SEED database via translated homology search and annotated to Subsystems, or functional levels, 1-3 using Super-Focus 6.
Sequence quality was good and host contamination was generally low. Reads removed because they matched one of the pre-specified host/contaminant genomes are grouped into the ‘Host/Contaminant’ category. Reads removed due to low complexity or quality are grouped into the ‘Low Quality’ category.
Samples from the Kangerlussuaq location had fewer reads than the other two sites, which suggests major differences in the DNA quality among the sites. For that readson, we excluded that site from the statistical analysis (PERMANOVA, differential abundance, and differences in alpha diversity).
Bacteria dominated the communities followed by Eukaryotes, Archaea, and Viruses accounting for 97.9%, 1.271%, 0.617%, and 0.21% respectively. Bacteria were dominated by Proteobacteria, Actinobacteria, and Bacteroidetes.
Taxonomic profiles (species-level) were compared using Bray-Curtis dissimilarities and samples were visualized using non-metric multidimensional scaling (NMDS).
Functional genes were organized into the the SEED functional hierarchy. At broad levels (levels 1 and 2), functional profiles were very similar.
We use a Permutational multivariate analysis variation to estimate the effects of experimental factors on both taxonomic and functional profiles. We found significant differences due to location which explained around 84.5% of the taxonomic profiles variation and 73.8% of the functional profiles variation.
| Df | SumOfSqs | R2 | F | Pr..F. | |
|---|---|---|---|---|---|
| Group | 1 | 0.815 | 0.845 | 43.492 | 0.014 |
| Residual | 8 | 0.150 | 0.155 | NA | NA |
| Total | 9 | 0.966 | 1.000 | NA | NA |
| Df | SumOfSqs | R2 | F | Pr..F. | |
|---|---|---|---|---|---|
| Group | 1 | 0.035 | 0.738 | 22.586 | 0.014 |
| Residual | 8 | 0.013 | 0.262 | NA | NA |
| Total | 9 | 0.048 | 1.000 | NA | NA |
We used Negative binomial models (DESEq2 R package) for differential abundance testing of taxonomic and subsystem level 3 features. We looked for differences between groups and found 3372 significant species and 620 differently abundant functions (adjusted p-values < 0.01).
Alpha diversity was calculated from taxonomic profiles (Species level) using Shannon’s diversity index. We observed significant differences between locations (excluding the outlier, Kangerlussuaq location, Kruskal-Wallis chi-squared = 6.8182, df = 1, p-value = 0.009023).
Bioinformatics Group at the Babraham Institute. Software available at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/↩︎
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2015;17:1–3.↩︎
Bolger AM., Lohse M. Trimmomatic: a flexible trimmer for Illumina sequence data. Usadel B. 2014.Bioinformatics.Aug 1;30(15):2114-20. doi:10.1093/bioinformatics/btu170.↩︎
Clarke, E.L., Taylor, L.J., Zhao, C. et al. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome 7, 46 (2019) doi:10.1186/s40168-019-0658-x. https://github.com/eclarke/komplexity.↩︎
Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0↩︎
Silva, G. G. Z., Green K., B. E. Dutilh, and R. A. Edwards: SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data. Bioinformatics. 2015 Oct 9. pii: btv584. Website: https://edwards.sdsu.edu/SUPERFOCUS↩︎